TRAVAUX DE L’INSTITUT DE LINGUISTIQUE DE LUND 47 Perception, Analysis and Synthesis of Speaker Age
Abstract
Speaker age is an important paralinguistic feature in speech which has to be considered in the study of phonetic variation. Knowledge about this feature may be used to improve speech technology applications, e.g. automatic speech recognition and speech synthesis. The present thesis describes six studies of several phonetic aspects of age-related variation in speech. As the speech production mechanism changes from young adulthood to old age, speech is affected in numerous ways. Human perception of speaker age is based on cues such as pitch, speech rate and voice quality, and is fairly accurate. However, it is still unclear which cues are the most important ones. The first study included in this thesis investigated the role of F0 and speech rate (word duration) in age perception. It was found that while these cues may be less important than spectral ones (e.g. formant frequencies), they still correlate with chronological as well as perceived age. In the second study, two stimulus types of various lengths were compared. Results indicated that while longer stimulus duration (regardless of speech type) seems to improve the age estimation of females, spontaneous speech (regardless of duration) appears to contain more important cues for perception of male speaker age. In the next two studies, several automatic estimators of speaker age were built, none of which reached the same accuracy as humans. Important features in machine perception of age were also investigated. It was found that prosodic features seem to be more important in the estimation of female age, while spectral features (e.g. F2) appear to be more important for male age. Although several acoustic correlates of speaker age are known, their relative importance has not yet been established. The next study analysed 161 features, automatically extracted from segments in six words produced by 527 speakers. Normalised means were used to ensure that the features could be compared directly. 
The most important acoustic correlates of speaker age were identified as speech rate (segment duration) and intensity range. However, F0 and some spectral measures (e.g. F1 and F2) may also be important correlates of age if used in combination with other features. Synthetic speech may sound more natural if speaker age is included as a parameter. The final study developed a research tool which used data-driven formant synthesis and age-weighted linear interpolation to simulate an age between the ages of any two of four differently aged female reference speakers. Evaluation of the tool showed that speaker age may in fact be simulated using formant synthesis. The tool will be used in further studies of analysis by synthesis of speaker age.
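The idea of age-weighted linear interpolation between two reference speakers can be illustrated with a minimal sketch. This is not the thesis's tool: the function name, the parameter names (`F0`, `F1`) and all numeric values are hypothetical and chosen only to show how a target age between two reference ages could translate into interpolation weights.

```python
def interpolate_parameters(target_age, younger, older):
    """Linearly interpolate synthesis parameters between two reference speakers.

    `younger` and `older` are (age, parameters) pairs, where parameters is a
    dict of hypothetical synthesis parameters sharing the same keys.
    """
    age_y, params_y = younger
    age_o, params_o = older
    if not age_y < age_o:
        raise ValueError("the younger reference must have the lower age")
    if not age_y <= target_age <= age_o:
        raise ValueError("target age must lie between the two reference ages")

    # Weight of the older speaker: 0 at the younger age, 1 at the older age.
    w = (target_age - age_y) / (age_o - age_y)
    return {
        name: (1 - w) * params_y[name] + w * params_o[name]
        for name in params_y
    }


# Hypothetical reference speakers (values invented for illustration only).
young_ref = (20, {"F0": 220.0, "F1": 700.0})
old_ref = (80, {"F0": 180.0, "F1": 640.0})

simulated = interpolate_parameters(50, young_ref, old_ref)
print(simulated)
```

A target age halfway between the two references weights both speakers equally; a target age closer to one reference pulls every parameter proportionally towards that speaker.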